Data mining techniques must be developed and applied to analyse large public databases containing hundreds to thousands of millions of entries. The aim of this study is to develop methods for locating previously unknown stellar clusters in the UKIDSS Galactic Plane Survey (GPS) catalogue data. Cluster candidates are searched for computationally in pre-filtered catalogue data using a method that fits a mixture model of Gaussian densities and background noise with the Expectation Maximization algorithm. The catalogue data contains a significant number of false sources clustered around bright stars. A large fraction of these artefacts was automatically filtered out before or during the cluster search. The UKIDSS data reduction pipeline tends to classify marginally resolved stellar pairs and objects seen against variable surface brightness as extended objects (or "galaxies" in the archive parlance). Of the sources in the UKIDSS GPS catalogue brighter than 17 magnitudes in the K band, 10%, or 66 x 10^6, are classified as "galaxies". Young embedded clusters create variable near-infrared (NIR) surface brightness because the gas and dust clouds in which they formed scatter the light from the cluster members. Such clusters therefore appear as clusters of "galaxies" in the catalogue and can be found using only a subset of the catalogue data. The detected "galaxy clusters" were finally screened visually to eliminate the remaining false detections caused by data artefacts. Besides the embedded clusters, the search also located sites of non-clustered embedded star formation. The search covered an area of 1302 square degrees, and 137 previously unknown cluster candidates and 30 previously unknown sites of star formation were found.
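The mixture-model fit mentioned above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names, the fixed number of Gaussian components, the uniform background spread over a known field area, and the synthetic test field are all assumptions made for illustration only.

```python
import numpy as np

def gaussian_pdf(xy, mean, cov):
    """Density of a 2D Gaussian evaluated at each row of xy."""
    d = xy - mean
    inv = np.linalg.inv(cov)
    norm = 2.0 * np.pi * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * np.einsum("ni,ij,nj->n", d, inv, d)) / norm

def em_gaussians_plus_background(xy, area, means, n_iter=100, sigma0=1.0, reg=1e-6):
    """EM for k Gaussians plus a uniform background of density 1/area.

    Hypothetical helper: `means` are initial guesses for the cluster centres,
    `sigma0` the initial cluster width, `reg` a small covariance regulariser.
    """
    xy = np.asarray(xy, dtype=float)
    means = np.array(means, dtype=float)
    n, k = len(xy), len(means)
    covs = np.array([np.eye(2) * sigma0**2 for _ in range(k)])
    weights = np.full(k + 1, 1.0 / (k + 1))  # last entry is the background
    for _ in range(n_iter):
        # E-step: responsibility of each component for each source.
        dens = np.empty((n, k + 1))
        for j in range(k):
            dens[:, j] = weights[j] * gaussian_pdf(xy, means[j], covs[j])
        dens[:, k] = weights[k] / area
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, centres and covariances.
        nk = np.maximum(resp.sum(axis=0), 1e-12)
        weights = nk / n
        for j in range(k):
            means[j] = resp[:, j] @ xy / nk[j]
            d = xy - means[j]
            covs[j] = (resp[:, j, None] * d).T @ d / nk[j] + reg * np.eye(2)
    return weights, means, covs, resp

# Synthetic field (assumed values): 300 background sources in a 10 x 10 box
# and a 100-member cluster centred at (3, 7).
rng = np.random.default_rng(0)
field = np.vstack([
    rng.uniform(0.0, 10.0, size=(300, 2)),
    rng.normal([3.0, 7.0], 0.3, size=(100, 2)),
])
weights, means, covs, resp = em_gaussians_plus_background(
    field, area=100.0, means=[[4.0, 6.0]]
)
print("cluster weight:", weights[0], "centre:", means[0])
```

The explicit background term lets sources far from any cluster be absorbed by the uniform component, so they do not inflate the fitted Gaussian widths. The final responsibilities in `resp` give a per-source membership probability that could be thresholded to list likely cluster members.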